You were introduced to these mysterious
properties called PartitionKey and RowKey, but you didn’t really learn much about
them. To understand partitioning, it is useful to have a mental model of
how the Azure Table service works. Azure tables give developers scalable
storage, which means developers should be able to dump terabytes of data
if necessary. All of this data naturally must be hosted across multiple
machines. The question then becomes, “How do you partition data across
these nodes?”
Partitioning has a few key implications. Picking the right
partitioning scheme is critical; otherwise, you could wind up with too much data on one
node (bad), or related data on different nodes (really bad). Entities
with the same partition key share the same partition, and are
guaranteed to be stored together. The partition is the unit of
distribution, and is primarily meant for scalability.
This does not mean, however, that each partition is located on a
separate node. The system automatically balances your partitions based
on size, traffic, and other factors. For example, several partitions
might start off together on the same node, but get moved away when one
partition grows in size. In any case, you should never depend on
separate partitions being together. On the other
hand, you can always depend on entities within the same partition being
together.
When users hear that entities with the same partition key are
stored together, they wonder whether they’ll run out of space when the
actual physical machine holding the partition runs out of space. The
answer is “no”—you cannot run out of space in a single partition.
Without revealing some of the “secret sauce” behind the storage
system, note that though terms such as node and
partition are used here, they don’t necessarily
mean “the same machine.” Some magic takes place under the covers to
ensure that data in the same partition can be queried and retrieved
together really, really quickly. However, it might be useful to have a mental model of
“one partition = one machine with a disk of near-infinite
capacity.” It makes visualization and whiteboard drawing
much easier.
Partitioning (or, to be more precise, specifying the right
partition key in the query) is the biggest factor affecting query
performance. The general principle behind fast queries in any storage
system is to structure your data and query in such a way that the
storage system must do a minimal amount of traversal.
In the database world, this typically means using indexes. Without
indexes to help the query processor find the right rows, every query
would result in a slow table scan across all rows. The same principle
holds true for Azure tables. You must partition your data and queries to
make the storage system do the least amount of traversal possible. In
general, you must make your queries as specific as possible.
Consider a simple table such as the one shown in Table 1.
Table 1. Superhero table
PartitionKey (Comic universe) | RowKey (Character name) | Property 3 (Superpower) | Property N (First appeared in)
---|---|---|---
Marvel | Cyclops | Heat Ray | The X-Men (#1)
Marvel | Wolverine | Healing + Adamantium Skeleton | The Incredible Hulk (#180)
DC | Superman | Flight, super-strength, and so on | Action Comics (#1)
DC | Batman | None | Detective Comics (#27)
DC | Lex Luthor | None | Action Comics (#24)
DC | Flash | Super speed | Flash Comics (#1)
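In code, an entity for this table might be modeled as follows. This is only a sketch to make the table concrete; the class name SuperheroEntity and its property names are illustrative assumptions, not types used later in this chapter.

// Hypothetical entity for Table 1: the comic universe becomes the
// PartitionKey, and the character name becomes the RowKey.
public class SuperheroEntity : TableServiceEntity
{
    public SuperheroEntity(string universe, string name)
        : base(universe, name) { }

    // Parameterless constructor needed by ADO.NET Data Services
    public SuperheroEntity() { }

    public string Superpower { get; set; }
    public string FirstAppearedIn { get; set; }
}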
Now, with the entries in Table 1 in
mind, let’s walk through a few sample queries (specified in pseudocode)
to see how partitioning can affect performance. Let’s assume that each
partition is hosted on a separate storage node.
First, let’s find entities with the following pseudocode:
partition = "DC" and RowKey="Flash"
This is the fastest kind of query. In this case, both the
partition key and the row key are specified. The system knows which
partition to go to, and queries that single partition for the specified
row.
Note:
Always try to specify the partition key in your queries. This
helps query performance because the storage system knows exactly which
node to query. When the partition key isn’t specified, the storage
system must query all the partitions in the system, which is obviously
slower. Whether you can specify partition
keys in all your queries depends on how you partition your data.
Next, let’s find entities with the following pseudocode:
PartitionKey="DC" and SuperPower=None
In this query, the partition key is specified, but the filter is on an
attribute other than the row key. This is fast (since the partition key is
specified), but not as fast as when the row key is specified.
Finally, let’s find entities with the following pseudocode:
SuperPower=None
This is the slowest kind of query. In this case, the storage
system must query each of the table’s partitions, and then walk through
each entity in the partition. You should avoid queries such as this that
don’t specify any of the keys.
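Expressed as LINQ against the hypothetical SuperheroEntity class sketched earlier (with svc as a table service context, and “Superheroes” as an assumed table name), the three queries look like this:

// 1. Point query: both keys specified (fastest)
var flash = (from e in svc.CreateQuery<SuperheroEntity>("Superheroes")
             where e.PartitionKey == "DC" && e.RowKey == "Flash"
             select e).FirstOrDefault();

// 2. Partition scan: partition key specified, filter on a non-key property
var dcWithoutPowers = from e in svc.CreateQuery<SuperheroEntity>("Superheroes")
                      where e.PartitionKey == "DC" && e.Superpower == "None"
                      select e;

// 3. Full scan: no keys specified (slowest; avoid)
var anyoneWithoutPowers = from e in svc.CreateQuery<SuperheroEntity>("Superheroes")
                          where e.Superpower == "None"
                          select e;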
In a traditional RDBMS, you would specify an index to speed up
such queries. However, Azure’s Table service doesn’t support these
“secondary indexes.” (The row key is considered to be the primary
index.) You can emulate the behavior of these secondary indexes
yourself, though, by creating another table that maps these properties
to the rows that contain them. You’ll see an example of how to do this
later.
Note:
Secondary indexes are part of the road map for Azure tables, and
you should see them in a future release. At that time, you won’t need
these workarounds.
This approach has a few downsides. First, you’ll wind up storing
more data in the system because of the extra tables. Second, you’ll be
doing more I/O in the write/update code, which could affect
performance.
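To make that trade-off concrete, here is the workaround in miniature. All the names here are hypothetical; the pattern is simply to write each logical row to two tables, keyed differently:

// Hypothetical "index" entity: PartitionKey = superpower, RowKey = character name
public class SuperpowerIndexEntity : TableServiceEntity
{
    public SuperpowerIndexEntity(string superpower, string name)
        : base(superpower, name) { }
    public SuperpowerIndexEntity() { }
}

// Every insert now writes two rows (the extra storage and I/O mentioned above)
svc.AddObject("Superheroes",
    new SuperheroEntity("DC", "Batman") { Superpower = "None" });
svc.AddObject("SuperpowerIndex", new SuperpowerIndexEntity("None", "Batman"));
svc.SaveChanges();

// "Who has no superpowers?" is now a single-partition query on the index table
var noPowers = from e in svc.CreateQuery<SuperpowerIndexEntity>("SuperpowerIndex")
               where e.PartitionKey == "None"
               select e;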
You should keep a couple of considerations in mind that influence
partitioning:
Ensuring locality of reference
In the previous query example, you saw how it is much faster
to query only a single partition. Imagine a scenario in which your
query must deal with different types of data. Ensuring that the
data has the same partition key means the query can return results
from just one partition.
Avoiding hot partitions
The storage system redistributes and load-balances traffic.
However, queries and updates to a given partition are served from that
same partition. It might be wise to ensure that hot data is split
across partitions to avoid putting a lot of stress on one node. In
general, though, this isn’t necessary:
Azure’s Table service can serve data from a partition quickly,
and can take quite a bit of load on a single partition. This is a
concern only for applications with very high read-access rates.
Running stress tests is a good way to identify whether
your application needs this.
Note:
You can create as many partitions as you like. In fact,
the more partitions you have, the better Azure’s Table service
can spread out your data in case of heavy load. Like all things
in life, this is a trade-off. Aggregate queries that span
multiple partitions will see a drop in performance.
1. Picking the right partition key
Like a ritual, designing a database schema follows some set patterns.
In short, you “model” the data you want to store, and then go about
normalizing this schema. In the Windows Azure world, you start the
same way, but you give a lot of importance to the queries that your
application will be executing. In fact, it might be a good idea to
begin with a list of queries that you know need good performance, and
use that as the starting point to build out the table schema and
partitioning scheme.
Follow these steps:
1. Start with the key queries that your system will execute.
Prioritize them in order of importance and the performance required.
For example, a query to show the contents of your shopping cart
must be much faster than a query to show a rarely generated
report.
2. Using those key queries, create your table schema. Ensure
that the partition key can be specified in
performance-sensitive queries. Estimate how much data you expect
in each table and each partition. If one partition winds up with
too much data (for example, if it is an order of magnitude greater
than any other partition), make your partitioning more granular by
concatenating other properties into the partition key.
For example, if you’re building a web log analyzer and
storing the URL as the partition key hurts you with very popular
URLs, you can put date ranges in the partition key. This splits
the data so that, for example, each partition contains data for a
URL for only a specific day.
3. Pick a unique identifier for the RowKey. Row keys must be unique within the
partition. For example, in Table 1, you
used the superhero’s name as the RowKey, since it was unique within the
partition.
Of course, hindsight is 20/20. If you find that your
partitioning scheme isn’t working well, you might need to change it
on-the-fly. In the previous web log analyzer example, you could do
that by making the size of the date range dynamic. If the data size on
a particular day was huge (say, over the weekend), you could switch to
using an hourly range only for weekends. Your application must be
aware of this dynamic partitioning, and this should be built in from
the start.
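In code, such a partition-key chooser might look like the following sketch (the method and its rules are illustrative assumptions, not a prescribed API):

// Hypothetical partition-key chooser for the web log analyzer:
// weekends get hourly partitions, other days get daily partitions.
public static string GetLogPartitionKey(string url, DateTime timestamp)
{
    bool isWeekend = timestamp.DayOfWeek == DayOfWeek.Saturday ||
                     timestamp.DayOfWeek == DayOfWeek.Sunday;
    return isWeekend
        ? url + "_" + timestamp.ToString("yyyyMMdd_HH")  // hourly range
        : url + "_" + timestamp.ToString("yyyyMMdd");    // daily range
}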
In general, partition keys are considered the unit of distribution/scalability,
while row keys are meant for uniqueness. If the key for your data
model has only one property, you should use that as your partition
key (an empty row key would suffice) and have one row per partition.
If your key has more than one property, distribute the properties
between the partition key and the row key to get multiple rows per
partition.
2. Testing the theory
You’ve just seen the impact of specifying versus not specifying
a partition key, or a query executing on one partition versus a query
executing on multiple partitions. Now, let’s build some home-grown
benchmarks to prove these points.
Warning:
These benchmarks were run from a network with multiple layers
of proxies (and several hundred miles) between the
machine and the cloud; when you run in the cloud, you’ll be
running in the same data center. Also, no optimizations were
performed as they would have been in a production application. You
should look at the relative difference between the following
numbers, rather than the actual numbers themselves. Running the same
unoptimized code in the cloud gives vastly different numbers: around
350 ms for retrieving 1,000 rows.
Example 1 shows a simple entity with a
partition key and a row key (which doubles up as the Data member). You also write a vanilla
TableServiceContext wrapper around the
entity. The entity isn’t interesting by itself. The interesting part
is how you partition the data.
Example 1. Test entity
public class TestEntity : TableServiceEntity
{
    public TestEntity(string id, string data)
        : base(id, data)  // id becomes the PartitionKey, data the RowKey
    {
        ID = id;
        Data = data;
    }

    // Parameterless constructor always needed for ADO.NET Data Services
    public TestEntity() { }

    public string ID { get; set; }
    public string Data { get; set; }
}

public class TestDataServiceContext : TableServiceContext
{
    public TestDataServiceContext(string baseAddress, StorageCredentials credentials)
        : base(baseAddress, credentials) { }

    internal const string TestTableName = "TestTable";

    public IQueryable<TestEntity> TestTable
    {
        get { return this.CreateQuery<TestEntity>(TestTableName); }
    }
}
Though it is not shown in Example 1, you
also create an exact copy of these two classes with the number
2 appended to the type names
(TestEntity2 and TestDataServiceContext2). You will try out
two different partitioning schemes on TestEntity and TestEntity2.
For TestEntity, let’s insert
100,000 rows, as shown in Example 2. Let’s create
them all in the same partition (with partition key 1). The storage system will place all the
entities on the same storage node.
Example 2. Inserting 100,000 rows into the same partition
var account = CloudStorageAccount.Parse(
    ConfigurationSettings.AppSettings["DataConnectionString"]);
var svc = new TestDataServiceContext(account.TableEndpoint.ToString(),
    account.Credentials);

for (int i = 1; i <= 100000; i++)
{
    // Every entity shares the same partition key, "1"
    svc.AddObject("TestTable", new TestEntity("1", "RowKey_" + i.ToString()));
    svc.SaveChanges();  // send the insert to the table service
}
For TestEntity2, let’s insert
100,000 rows, but let’s split them among 1,000 different partitions.
You loop from 1 to 100,000 and take the loop counter modulo 1,000 to get
evenly spaced partitions. Example 3 shows how to do
this.
Example 3. Inserting 100,000 rows in 1,000 partitions
var account = CloudStorageAccount.Parse(
    ConfigurationSettings.AppSettings["DataConnectionString"]);
var svc2 = new TestDataServiceContext2(account.TableEndpoint.ToString(),
    account.Credentials);

for (int i = 1; i <= 100000; i++)
{
    // i % 1000 spreads the rows evenly across 1,000 partition keys
    svc2.AddObject("TestTable2", new TestEntity2((i % 1000).ToString(),
        "RowKey_" + i.ToString()));
    svc2.SaveChanges();  // send the insert to the table service
}
Now, let’s run three different queries. The first query will be
against the 100,000 rows of TestEntity that are in the same partition.
The second will be against the 100,000 rows of TestEntity2, but with no partition key
specified. The third will be the same as the second, but with the
partition key specified in the query. Example 4 shows the code for the
three.
Example 4. Three different queries
// Single-partition query
var query = from entity in svc.CreateQuery<TestEntity>("TestTable")
            where entity.PartitionKey == "1" && entity.RowKey == "RowKey_55000"
            select entity;

// Multiple-partition query - no partition key specified
var query2 = from entity in svc2.CreateQuery<TestEntity2>("TestTable2")
             where entity.RowKey == "RowKey_55553"
             select entity;

// Multiple-partition query - partition key specified in query
var query3 = from entity in svc2.CreateQuery<TestEntity2>("TestTable2")
             where entity.PartitionKey == "553" && entity.RowKey == "RowKey_55553"
             select entity;
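The timing harness itself isn’t shown in the listing; a minimal sketch (an assumption, not the original benchmark code) might look like this:

// Run one of the queries above 1,000 times and measure the total time
var watch = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 1000; i++)
{
    var entity = query.FirstOrDefault();  // executes the query each iteration
}
watch.Stop();
Console.WriteLine("1,000 iterations took {0:F1} seconds",
    watch.Elapsed.TotalSeconds);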
In each of these queries, let’s retrieve one entity using the
FirstOrDefault method. Table 2 shows the relative numbers
for 1,000 iterations of each of these queries.
Table 2. Query performance comparison
Query type | Time for 1,000 iterations (in seconds)
---|---
Single partition | 26
Multiple partitions, no partition key specified | 453
Multiple partitions, partition key specified | 25
The results speak for themselves. Going to a single partition
(either because all your data is stored in it or because you specified
it in the query) is always much faster than not specifying the
partition key. Of course, using only a single partition has several
downsides, as discussed earlier. In general, query times are affected
less by how the data is partitioned than by whether the partition key
is specified in the query.
Warning:
If you want to run similar tests, insert the following
configuration settings into your configuration file (either App.config or web.config):
<system.net>
<settings>
<servicePointManager expect100Continue="false"
useNagleAlgorithm="false" />
</settings>
</system.net>
The first setting deals with a bug in .NET where
every request is sent with an Expect: 100-Continue header. If you’re sure that
your client handles errors from the server well, you can turn this
off.
The second setting addresses an issue that arises if you do several
synchronous updates close together, as this benchmark program does.
Because delayed ACKs are turned on at the server, the client winds up
waiting much longer than it should when the Nagle algorithm is
turned on.
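Equivalently, you can switch off both behaviors in code; these are standard System.Net properties:

// Same effect as the configuration file entries shown earlier.
// Set these once at startup, before making any storage requests.
System.Net.ServicePointManager.Expect100Continue = false;
System.Net.ServicePointManager.UseNagleAlgorithm = false;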